ABSTRACT

Livingston's reliability coefficients and Harris' indices of efficiency
were computed along with the classical internal consistency coefficients,
KR-20's (Kuder-Richardson internal consistency coefficients), for 678
criterion-referenced tests in the A through E levels of an individualized
mathematics program. The coefficients were carefully studied and compared
with each other in relation to the number of students, the number of items,
the percentage points of the mastery criterion score and of the mean, the
absolute value of the difference of the mean from the mastery criterion score
expressed both as a percentage and in a standard score form, the standard
deviation, the proportion of mastery students, the shape of the score
distribution, and the mastery status indices derived from the cross-tabulated
tables of students' performance on the pretest and the Curriculum Embedded
Test (CET), the pretest and the posttest, and the CET and the posttest.
(Author/RC)



INTRODUCTION 



Two procedures have recently been proposed for estimating the
reliability of a criterion-referenced test from total test scores.

Livingston (1970) derived a reliability coefficient for a criterion-
referenced test by redefining the variance as a deviation from the mastery
criterion score rather than from the mean score, as in classical test
theory. He showed the relation between the classical reliability
coefficient and his reliability coefficient for a criterion-referenced test,
k²(X,T), as:

$$k^2(X,T) = \frac{\rho^2(X,T)\,\sigma_X^2 + (\mu_X - C)^2}{\sigma_X^2 + (\mu_X - C)^2} \qquad (1)$$

where ρ²(X,T) is a classical reliability coefficient, σX² is the test variance,
μX is the test mean and C is the mastery criterion score.
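A minimal Python sketch of equation (1) follows. The function name `livingston_k2` and the example values are illustrative assumptions; the reliability argument can be any classical estimate, such as the KR-20 values used in this study.

```python
def livingston_k2(rho2: float, variance: float, mean: float, criterion: float) -> float:
    """Livingston's k^2(X,T) as written in equation (1).

    rho2      -- classical reliability estimate, e.g. KR-20
    variance  -- test score variance (sigma_X squared)
    mean      -- test mean (mu_X)
    criterion -- mastery criterion score (C), in raw-score units
    """
    dev2 = (mean - criterion) ** 2  # squared distance of the mean from C
    denom = variance + dev2
    if denom == 0:
        raise ValueError("undefined when the variance is 0 and the mean equals C")
    return (rho2 * variance + dev2) / denom


if __name__ == "__main__":
    # Hypothetical test: KR-20 = .40, variance = .25, criterion C = 4 items.
    for m in (4.0, 4.5, 5.0):
        print(m, round(livingston_k2(0.40, 0.25, m, 4.0), 3))
```

With these hypothetical values the loop prints k² values of about .40, .70 and .88: when the test variance is small, moving the mean only half an item away from C raises k²(X,T) well above the classical coefficient.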

Livingston's proposal has been subjected to a substantial amount of
critical analysis: Hambleton and Novick, 1972; Shavelson, Block, and Ravitch,
1972; Harris, 1972-a; and Raju, 1973. The primary criticism within these
analyses centered on the inclusion of the (μX - C)² term. Specifically,
Shavelson, Block and Ravitch (1972) observed that the term (μX - C)² dominates
in determining k²(X,T) for a criterion-referenced test whose test variance
is relatively small. Hambleton and Novick (1972) indicated that Livingston's
coefficient misses the essential point of criterion-referenced testing, and
that the critical problem is one of deciding whether a student's true score
is above or below the mastery criterion score, not one of showing how far his
obtained score departs from the criterion score. Harris (1972-a) and Raju (1973)
independently derived the same formula through the two-groups approach,
under different assumptions, and concluded that Livingston's coefficient
was impractical and unreasonable because it seemed hardly to meet their
assumptions. In addition, Harris (1972-a) stated that "although
Livingston's reliability coefficient is generally larger than the conventional
one, the standard error of measurement (which gives more meaningful information
in deciding whether the student has a true score below or above a certain mastery
criterion score) is the same."

At the 1972 AERA Meeting in Chicago, Harris (1972-b) proposed his
index of efficiency:

$$\mu_c^2 = \frac{SS_b}{SS_b + SS_w} \qquad (2)$$



where SSb and SSw denote the between- and within-group sums of squares that
are determined by the two groups resulting from the dichotomization into mastery
and non-mastery categories. Technically, his index of efficiency represents
the correlation between the dummy variable that designates the group (mastery
or non-mastery) and the total test score. Therefore, it does not depend upon
the number of items. In this sense, it differs from conventional coefficients,
which increase as the number of items increases. It is, however, similar to
them in dropping to 0.00 when all or none of the tested students belong to the
mastery group. In addition, the index becomes 1.00 when the following condi-
tions are satisfied: (1) the students are divided into mastery and non-mastery
groups, and (2) the within-group variance is equal to zero. As an extreme
case, the index is 0.00 when all the students achieve above the mastery cri-
terion score, but it changes to 1.00 when even one student misses one item on
a 5-item test that has 100% correct response as the mastery criterion score.

Marshall (1973) made an intensive study of the behavior of Harris' index with
simulated data. Among his findings that relate to the present study are: (1)
the index is not affected significantly by either the number of subjects or
the number of items; (2) the index is affected by changes in the criterion: the
higher the criterion, the lower the index, except when the total scores are all
close to the number of items, in which case the trend is reversed; (3) the index
increases as the range of competence increases for a given category of input
competence vector; (4) the index decreases when the unaccounted-for error vari-
ance increases, except when total scores are for the most part well above the
criterion level; and (5) the index is generally higher as the mean of the test
increases, for a given criterion level, unless the total score distribution is
high in the extreme.

The present study intends (1) to investigate the behavior of the two
coefficients and the conventional reliability coefficient (KR-20) computed on
the basis of real data that were collected from three IPI Mathematics,
Edition II field test schools, in relation to the number of students (N), the
number of items (K), the percentage points of the mastery criterion score (Pc)
and of the mean (Px), the absolute difference of the mean from the mastery
criterion score expressed in percent (|Px - Pc|) and in a standard score form,
the standard deviation (SD), the percent of mastery students (Pm),
the test type (Pretest, Curriculum Embedded Test, and Posttest), and the shape
of the score distribution (normal, J-shaped, L-shaped, rectangular, etc.); (2)
to compare the average size of the two coefficients for each level of the fac-
tors mentioned in (1); and (3) to study the relation of the two coefficients
to the mastery status indices derived from cross-tabulated tables of students'
performances on the pretest and CET, the pretest and posttest, and the CET and posttest.

It is hoped that the present study will yield useful and significant information
that might aid the development of theory and the improvement of practice
in criterion-referenced testing.

DATA, METHODS AND PROCEDURES

The data used in the present study were collected from three IPI Mathematics,
Edition II field test schools in the 1971-72 school year. IPI Mathematics,
Edition II is a new version of IPI Mathematics, which was originally developed
by the Learning Research and Development Center of the University of Pittsburgh,
revised by Research for Better Schools, and published by Appleton-Century-Crofts.
It covers K-6 contemporary mathematics content, which is divided into 10 content
areas: Numeration and Place Value, Addition and Subtraction, Multiplication,
Division, Fractions, Money, Time, Systems of Measurement, Geometry, and Applica-
tions. Instructional objectives in each content area are grouped into several
levels (mostly Level A through Level G).

The student, who is placed in an appropriate level on the basis of his or
her placement test score, takes the pretest, which consists of items designed to
measure the terminal behavior(s) of each objective in the unit. The student
begins his study with the lowest-numbered skill in the unit on which he did
not demonstrate mastery on the pretest. Right after the lesson, the student
takes the Curriculum Embedded Test (CET). If the student shows mastery on the
CET, he then moves to the next unmastered skill. When the student completes
all of the unmastered skills in the unit, he then takes the unit posttest.
Therefore, the CET's can be regarded as immediate posttests and the posttests
as delayed ones. These tests were administered on an individual basis; con-
sequently, the number of students who took a test varies from test to test.

A computer program named SCORHWT3 was specially developed for the purpose
of this study. It provides the user with a score distribution, mean, median,
standard deviation, coefficient alpha (of which KR-20 is a special case),
Livingston's coefficient and the proportion of mastery students when a mastery
criterion score, C, is specified. It also gives Harris' index of efficiency,
μc, and μc² for each of the available score points in the score distribution
upon the user's request.

Thus far, 274 A-E level pretests, 209 A-D level CET's, and 212 A-D level
posttests have been analyzed. Nine pretests, one CET and seven posttests were
not used as data because they were one-item tests. The actual number of tests
that constitute the data of the present study is presented in Table 1.



Table 1. Number of Tests Used as Data

                                           CONTENT AREA
Level  Test      N/PV    A/S  Mult.   Div. Fract.  Money   Time    SOM  Geom.  Appl.  TOTAL
A      Pre        13*    16*      -      -      3      1     0*      -      -      -     33
       CET        13     17      -      -      3      1      0      -      -      -     34
       Post       13*    16*      -      -      3      0      0      -      -      -     32
B      Pre         6*     12      4     3*      3      1     1*      3      3      3     39
       CET         7*     12      4      3      3      1      1      3      3      3     40
       Post        6*     12      4      3      3      0     1*      3      3      3     38
C      Pre         14     13      7      4      6      5      6      6      1      8     70
       CET         13     13      7      4      6      5      5      6      1      8     68
       Post        14     13      7      4      6      5      6      6      0      8     69
D      Pre          5     10      9      7      7      5      4      6      4      9     66
       CET          5     10      9      7      7      5      4      6      4      9     66
       Post         5     10      9      7      7      5      4      6      4      9     66
E      Pre          6      4      7      9     11      -      4      5      6      5     57
TOTAL  Pre         44     55     27     23     30     12     15     20     14     25    265
       CET         38     52     20     14     19     12     10     15      8     20    208
       Post        38     51     20     14     19     10     11     15      7     20    205

N/PV = Numeration and Place Value; A/S = Addition and Subtraction; SOM = Systems
of Measurement; Appl. = Applications.
* One, two or three one-item tests were excluded from the unit.
The test consistency index and/or the efficiency index of instruction
were derived from the results of the cross-tabulation of two test scores as
follows:

                                  FIRST TEST
                          Non-mastery      Mastery
SECOND TEST  Mastery        Pnm-m           Pm-m
             Non-mastery    Pnm-nm          Pm-nm
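The P's in the table represent percentages; the first part of each subscript
gives the status on the first test and the second part the status on the
second test.

$$I_{\text{pre-CET}} = P_{\text{m-m}} + P_{\text{nm-m}} - P_{\text{m-nm}} \qquad (3)$$

$$I_{\text{pre-post}} = P_{\text{m-m}} + P_{\text{nm-m}} - P_{\text{m-nm}} \qquad (4)$$

$$I_{\text{CET-post}} = P_{\text{m-m}} - P_{\text{m-nm}} - P_{\text{nm-nm}} \qquad (5)$$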



All reliability and other information for a test were recorded on a
standard optical scanning sheet from which the data card was punched. Since
it was impossible to mark a negative sign on the standard optical scanning
sheet, negative values of KR-20's and Livingston's coefficients were
recorded as 0's.

Correlations were computed by BMDOBD (Dixon, 1970) for pretest, CET,
and posttest data separately and then for the combined total test data.

Data were grouped into two to four categories according to the frequency
listings of the number of cases (N), number of items (K), percentage points of
the mastery criterion score (Pc) and of the mean (Px), the difference between the
mean and the mastery criterion score expressed both in percentage (|Px - Pc|) and
in standard score form (|X - C|/SD), standard deviation (SD), the proportion of
mastery students (Pm), and the shape of the score distribution (SSD). Then nine
two-factor multivariate analyses of variance were performed in order to compare
the magnitudes of KR-20's, Livingston's coefficients, and Harris' maximum μc²'s
and μc's. The first, three-level factor was the same for all MANOVA's: test
type (pretest, CET and posttest). The second factor in each MANOVA
consisted of one of the above-mentioned variables blocked into two to four
categories. The dependent measures in each MANOVA were the four coefficients:
KR-20, k²(X,T), maximum μc², and μc. MANOVA was used in order to perform four
ANOVA's at the same time. Prior to MANOVA, KR-20's, Livingston's coefficients
and Harris' indices were transformed into Fisher's Z's, and Harris' μc²'s were
transformed into radians by arcsine transformation following Edwards' (1968)
recommendations.
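The two transformations named above can be sketched as follows. The source does not say which arcsine variant was used; this sketch assumes the common form asin(√p) in radians (some texts use 2·asin(√p) instead). Coefficients recorded as exactly 1.00 have no finite Fisher Z, and how such cases were handled in the original analysis is not stated, so the sketch simply rejects them. Function names are hypothetical.

```python
import math


def fisher_z(r: float) -> float:
    """Fisher's r-to-Z transformation: Z = atanh(r) = 0.5 * ln((1 + r) / (1 - r))."""
    if not -1.0 < r < 1.0:
        raise ValueError("Fisher's Z is undefined for r <= -1 or r >= 1")
    return math.atanh(r)


def arcsine_radians(p: float) -> float:
    """Arcsine transformation of a proportion, in radians (assumed form asin(sqrt(p)))."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return math.asin(math.sqrt(p))


if __name__ == "__main__":
    print(fisher_z(0.5))          # about 0.549
    print(arcsine_radians(0.25))  # pi/6, about 0.524
```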

Only the results of the correlational study are reported in this paper;
the results of the MANOVA's will be presented in a separate paper.

RESULTS

The cross-tabulation results revealed that the distributions of KR-20's,
Livingston's coefficients, and Harris' indices were quite different for the
pretests, CET's and posttests (χ² = 156.38 with 20 d.f. for KR-20's, χ² = 127.47
with 20 d.f. for k²(X,T)'s, and χ² = 48.14 with 14 d.f. for μc's). Generally,
pretest coefficients showed negatively skewed distributions with fewer extreme
values (such as 0.0 and 1.00). The distributions for CET's and posttests were
less skewed than those for pretests, but they contained more extreme values,
especially 0.0 values.

The correlations of test type (value 1 was assigned to pretests, 2 to CET's
and 3 to posttests) with KR-20, k²(X,T) and μc were -.27, -.26 and -.04,
respectively, with the first two coefficients being significant at the .01
level. The -.04 value was not significant. The difference between the last
two coefficients was statistically significant at the .01 level when Hotelling's
t-test (Walker & Lev, 1953, 259-260) was applied (t = 5.25). The results imply
that larger KR-20 and k²(X,T) coefficients are obtainable when a CRT is used
as a pretest, whereas μc does not change much with the shift in test type.
These results seem quite reasonable if one considers that greater test variance
may be expected when a test is used as a pretest than when it is used as a CET or
as a posttest, and that μc does not have any relation with the variance.
Therefore, all further analyses were carried out separately for pretests, CET's
and posttests.

A. Means, Standard Deviations and Intercorrelations of the Three Reliability
Coefficients

Table 2 presents the means and standard deviations of KR-20, k²(X,T) and
μc for pretests, CET's, posttests and the combined total test data.
The significance of the mean difference between k²(X,T) and μc was tested by
using the t-test technique for paired observations (Walker & Lev, 1953,
151-154).
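Because the paired t-test needs only the two means, the two standard deviations, and the correlation between the paired coefficients, it can be checked against the published summaries. A minimal sketch, with a hypothetical function name:

```python
import math


def paired_t_from_summary(mean1: float, sd1: float, mean2: float, sd2: float,
                          r12: float, n: int) -> float:
    """t for the mean difference of paired observations, from summary statistics.

    The standard deviation of the differences is
    sqrt(sd1**2 + sd2**2 - 2 * r12 * sd1 * sd2); df = n - 1.
    """
    sd_diff = math.sqrt(sd1**2 + sd2**2 - 2 * r12 * sd1 * sd2)
    return (mean2 - mean1) / (sd_diff / math.sqrt(n))


if __name__ == "__main__":
    # Posttest values: k^2(X,T) .673 (SD .259), mu_c .818 (SD .213),
    # r(k^2, mu_c) = -.339 from Table 3, N = 205.
    print(round(paired_t_from_summary(0.673, 0.259, 0.818, 0.213, -0.339, 205), 2))
```

Using the posttest values from Table 2 and r = -.339 from Table 3, the sketch gives t of about 5.36, close to the reported 5.37; the small gap is consistent with rounding of the published inputs.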
Table 2. Means and Standard Deviations of the Three Reliability Coefficients

Test Type                   KR-20   k²(X,T)   μc          t
Pretest   (N=265)   Mean    .730     .822     .835       .92
                    SD      .237     .188     .104
CET       (N=208)   Mean    .415     .628     .755      3.92**
                    SD      .299     .270     .295
Posttest  (N=205)   Mean    .542     .673     .818      5.37**
                    SD      .309     .259     .213
TOTAL     (N=678)   Mean    .577     .717     .805      6.22**
                    SD      .309     .252     .214

t: paired t-test of the difference between the k²(X,T) and μc means.
** Significant at the .01 level.



Table 3. Intercorrelations of the Three Reliability Coefficients

Pretest       KR-20   k²(X,T)        CET           KR-20   k²(X,T)
k²(X,T)        .838                  k²(X,T)        .505
μc             .124    -.164         μc             .445    -.364

Posttest      KR-20   k²(X,T)        Total         KR-20   k²(X,T)
k²(X,T)        .684                  k²(X,T)        .702
μc             .318    -.339         μc             .359    -.246






On average, the μc mean was higher than the k²(X,T) mean for all
of the pretest, CET, and posttest cases. The mean difference was significant
at the .01 level for the CET, posttest, and combined data. The mean
difference for the pretest was not statistically significant, but the standard
deviation of the μc's was considerably smaller than that of k²(X,T). As was
expected, the mean of k²(X,T) was always higher than that of KR-20 for all
test types. KR-20 had the largest standard deviation among the three co-
efficients for all test types.

Intercorrelations between pairs of the three reliability coefficients
are presented in Table 3. All correlation coefficients are statistically
significant at the .01 level except for the correlation between KR-20 and
μc based on the pretest data, which is significant at the .05 level. The
KR-20 and k²(X,T) coefficients derived from the pretests were very highly
correlated, which seems to imply that the pretest situation is quite similar
to a classical testing situation insofar as those coefficients are concerned.
It is worth noticing that the two reliability coefficients for a
criterion-referenced test are negatively correlated across all of the
test types.

B. Influence of Related Variables on the Three Reliability Coefficients

It is very difficult to single out the effects of any one variable
on the three reliability coefficients, because they all have more than
two terms in their respective computational formulae and each variable
is interdependent with many other variables and conditions. In this sec-
tion, the zero-order correlations of the three coefficients with selected
variables are presented, the significance of the difference in the correla-
tions of a studied variable with k²(X,T) and μc is tested, and possible
relations with the other variables are discussed. The significance of the
difference was tested by using Hotelling's method (Walker and Lev, 1953,
258-259).
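Hotelling's test for two correlations that share a variable (here, a studied variable correlated with both k²(X,T) and μc) is commonly written as below, with n - 3 degrees of freedom. This is a sketch of the standard formula, not a transcription of Walker and Lev's presentation; the function name is hypothetical.

```python
import math


def hotelling_t(r12: float, r13: float, r23: float, n: int) -> float:
    """Hotelling's t for two dependent correlations that share variable 1.

    Tests r12 against r13 given r23, with n - 3 degrees of freedom.
    """
    # Determinant of the 3 x 3 correlation matrix of variables 1, 2 and 3.
    det = 1 - r12**2 - r13**2 - r23**2 + 2 * r12 * r13 * r23
    return (r12 - r13) * math.sqrt((n - 3) * (1 + r23) / (2 * det))


if __name__ == "__main__":
    # TOTAL row of Table 5: r(K, k^2) = .16, r(K, mu_c) = -.12;
    # Table 3: r(k^2, mu_c) = -.246; N = 678 tests.
    print(hotelling_t(0.16, -0.12, -0.246, 678))
```

With the TOTAL row of Table 5 (r = .16 with k²(X,T), r = -.12 with μc) and r = -.246 between k²(X,T) and μc from Table 3, the sketch gives t of about 4.69 against the reported 4.72, within the rounding of the published correlations.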

1. Number of Cases (N)

Table 4 presents the correlations of the three coefficients with
the number of cases (the number of students who took the test).
Table 4. Correlations of the Number of Cases with the Three Reliability
Coefficients

                No. of Cases              Correlations with
Test Type       Mean       SD     KR-20    k²(X,T)  μc            t
Pretest       163.26   106.91     .10      .09      .10         .08
CET            80.25    58.20     .37**    .05      .33**    2.59**
Posttest      109.40    73.90     .15*    -.01      .22**     2.00*
TOTAL         121.51    91.59     .31**    .17**    .22**      1.18

*  Significant at the .05 level.
** Significant at the .01 level.

In general, all three reliability coefficients had positive
relationships with the number of cases for the combined total test
data. The classical reliability coefficient was most highly
correlated with the number of cases, as expected. The differences
between the correlation of N with KR-20 and its correlations with the
other two reliability coefficients were significant at the .01 (t = 4.93)
and .05 (t = 2.21) levels, respectively, whereas the difference
between the latter two coefficients was not statistically significant.
The number of students did not show any significant relation with
the three reliability coefficients when the calculations were based
on the pretest data.

KR-20 and μc, however, were significantly related to the
number of students involved when the correlations were derived from
CET or posttest data. In contrast, the correlation between k²(X,T) and the
number of cases was not statistically significant for CET's or for
posttests. Consequently, the difference between r(N, k²(X,T)) and r(N, μc)
was significant at the .01 level for the CET case and at
the .05 level for the posttest case.

Cross-tabulation results showed that both KR-20 and μc had distri-
butions that were L-shaped or extremely positively skewed U-shapes when the
number of cases was less than 30. As the number of cases increased,
the shape of the KR-20 distributions gradually shifted from positively
to negatively skewed, while the shape of the μc distributions
rapidly shifted from positively to negatively skewed.




In short, the above findings imply that Livingston's coefficients
are not significantly related to the number of cases, while the
classical internal consistency coefficient and Harris' index of
efficiency are positively correlated with the number of cases. These
relationships occurred when the tests were administered as posttests
(either as immediate or as delayed posttests).

2. Number of Items (K)

It is well known that KR-20 increases as the number of items
increases, especially when the items are homogeneous. Livingston's
coefficient is expected to have a similar relationship with the number
of items because it includes KR-20 as a term. Harris' index
supposedly does not have any relationship with the number of items.
The correlation coefficients of the number of items with the three
reliability coefficients are presented in Table 5.
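The expected dependence of KR-20 on test length can be illustrated with the KR-20 formula and the Spearman-Brown projection. The Spearman-Brown formula is not used in this study; it is included here only as a standard way to show how a classical coefficient rises when parallel items are added. Function names are hypothetical.

```python
from statistics import pvariance


def kr20(items: list[list[int]]) -> float:
    """KR-20 for a 0/1 item-response matrix (rows = students, columns = items)."""
    n = len(items)
    k = len(items[0])
    if k < 2:
        raise ValueError("KR-20 needs at least two items")
    totals = [sum(row) for row in items]
    var_total = pvariance(totals)  # population variance of total scores
    if var_total == 0:
        raise ValueError("KR-20 is undefined when all total scores are equal")
    # Sum of item variances p * q, where p is the proportion answering correctly.
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in items) / n
        sum_pq += p * (1 - p)
    return k / (k - 1) * (1 - sum_pq / var_total)


def spearman_brown(rho: float, factor: float) -> float:
    """Projected reliability after lengthening a test by `factor` with parallel items."""
    return factor * rho / (1 + (factor - 1) * rho)


if __name__ == "__main__":
    print(round(spearman_brown(0.50, 2), 2))  # doubling a test with reliability .50
```

Doubling a test whose reliability is .50 projects to about .67, whereas μc in equation (2) contains no term for the number of items.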

Table 5. Correlations of the Number of Items with the Three Reliability
Coefficients

                No. of Items              Correlations with
Test Type       Mean       SD     KR-20    k²(X,T)  μc            t
Pretest         6.22     5.94     .18**    .11     -.11        2.43*
CET             6.72     6.60     .20**    .23**   -.13*     3.17**
Posttest        6.24     6.65     .20**    .19**   -.13*     2.79**
TOTAL           6.38     6.36     .16**    .16**   -.12**    4.72**

*  Significant at the .05 level.
** Significant at the .01 level.




As was expected, KR-20 showed a positive relation with the number
of items for all test types (r = .18 to .20). k²(X,T) also had positive
relations with the number of items, although the correlation
coefficient for pretests was not statistically significant. Interest-
ingly, μc had negative correlations with the number of items, and the
correlation coefficient for the pretest data was again not statistically
significant. Consequently, the differences between the correlations
of the number of items with k²(X,T) and with μc were significant at
the .05 level for pretests and at the .01 level for the other
tests and for the combined total test data. Cross-tabulation
of K with μc shows that computing μc was adequate when K [illegible].

3. Percent Point of the Mastery Criterion Score (Pc)

The mastery criterion score for a test was decided on the basis
of the complexity of the skill and the number of items in the test.
Generally, one hundred percent correct was regarded as mastery for
a test with fewer than five items. Lower percents correct were required
for tests designed to measure complex skills. Therefore, there is
no theoretical basis to expect any relationship between Pc and KR-20,
between Pc and k²(X,T), or between Pc and μc.

Table 6. Correlations of the Percent Point of the Mastery Criterion Score
with the Three Reliability Coefficients

                     Pc                   Correlations with
Test Type       Mean       SD     KR-20    k²(X,T)  μc            t
Pretest        91.28     7.85    -.18**   -.05      .02         .74
CET            91.30     7.51     .00     -.30**    .35**    6.19**
Posttest       92.40     7.56    -.13*    -.19**    .12*     2.75**
TOTAL          91.62     7.66    -.10*    -.18**    .19**    6.20**

*  Significant at the .05 level.
** Significant at the .01 level.



Table 6 shows that Pc was generally negatively correlated with KR-20 and
k²(X,T), and positively correlated with μc. The correlations of Pc
with k²(X,T) and μc for pretests were not statistically significant.
The obtained correlations of μc with Pc seem to support the second
part of Marshall's (1973) finding that the index is affected by changes
in the criterion: the higher the criterion, the higher the index when
the total scores are all close to the number of items. Almost all CET's
and most of the posttests fell into this case.

4. Percent Point of the Mean (Px)

When the percent point of the mean approaches an extreme value
(0 or 100 percent), the test variance is reduced, with a concomitant
decrease in KR-20. Table 7 shows this decreasing trend well.



Table 7. Correlation Coefficients of the Percent Point of the
Mean with the Three Reliability Coefficients

                     Px                   Correlations with
Test Type       Mean       SD     KR-20    k²(X,T)  μc            t
Pretest        67.86    22.78    -.34**   -.50**    .04      6.02**
CET            93.51     5.09    -.24**    .17**   -.28**    4.04**
Posttest       90.87     6.72    -.44**   -.20**   -.22**       .21
TOTAL          82.68    19.14    -.44**   -.39**   -.12**    4.96**

*  Significant at the .05 level.
** Significant at the .01 level.



The relationship of Px with k²(X,T) was inconsistent because an
increase in Px affects, in two ways, two of the most important terms
used in determining k²(X,T) from classical reliability coefficients,
namely the standard deviation and (μX - C)². For the pretests, where
most test means were below the mastery criterion score, an increase in
the mean resulted in a reduction of both the test variance and the
(μX - C)² value. The same reasoning may be applied to the posttests,
because the mean of Pc was higher than the mean of Px for posttests.
For CET's, however, where the mean of Px was higher than the mean of Pc,
an increase in the mean results in an increase of the (μX - C)² term,
which contributes more than the test variance in determining k²(X,T)
for CET's, where the test variance is usually small.

There were significant negative correlations between Px and μc
for CET's and for posttests. There were two, nine, and seven 100
percent mastery cases, for which the values of μc were zero, in pre-
tests, CET's and posttests, respectively. It is hard to believe,
however, that these extreme cases were the sole reason for the
negative correlations for the CET's and the posttests. In this re-
gard, the present results do not agree with Marshall's finding that
the index is generally higher as the mean of the test increases for a
given criterion level.
5. Difference Between the Mean and the Mastery Criterion Score (|Px - Pc|
and |(X - C)/SD|)

As indirectly suggested in the previous discussions of Pc
and Px, the difference between the mean and the mastery criterion
score has a close relationship with the magnitude of k²(X,T).
Tables 8 and 9 present the relationships.



Table 8. Correlations of the Difference between the Mean and the
Mastery Criterion Score Expressed in Percentage with the
Three Reliability Coefficients

                  |Px - Pc|               Correlations with
Test Type       Mean       SD     KR-20    k²(X,T)  μc            t
Pretest        25.03    22.44     .25**    .48**   -.06      6.43**
CET             7.65     5.09    -.08      .42**   -.32**    7.23**
Posttest        7.45     6.06     .14*     .25**    .09        1.46
TOTAL          14.39    16.99     .31**    .42**    .01      7.50**

*  Significant at the .05 level.
** Significant at the .01 level.



Table 9. Correlations of the Difference between the Mean and the
Mastery Criterion Score Expressed in a Standard Score Form
with the Three Reliability Coefficients

                 |(X - C)/SD|             Correlations with
Test Type       Mean       SD     KR-20    k²(X,T)  μc            t
Pretest          .86      .78     .05      .34**   -.10      4.87**
CET              .83      .97    -.37**    .42**   -.66**   12.87**
Posttest         .55      .70    -.24**    .14*    -.32**    4.29**
TOTAL            .76      .84    -.16**    .32**   -.43**   13.92**

*  Significant at the .05 level.
** Significant at the .01 level.



According to Tables 8 and 9, k²(X,T) was consistently and positively
correlated with the difference between the mean and the mastery
criterion score, expressed in both percentage and standard score
forms, for all test types. Obviously, the bigger the discrepancy,
the larger Livingston's coefficient. The discrepancy in percentage
form seems more directly related to the magnitude of k²(X,T) than
when it is expressed in standard score form. It is interesting to
note that, contrary to expectation, μc was negatively correlated with
the discrepancy expressed in standard score form. These correlation
coefficients were significant and relatively large for the CET and
posttest, where the test variances were relatively small.
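The positive relation between k²(X,T) and the standardized discrepancy can be read directly from equation (1). Dividing its numerator and denominator by σX² (assuming σX > 0) and writing d = (μX - C)/σX gives the following short derivation, added here for illustration:

$$k^2(X,T) = \frac{\rho^2(X,T) + d^2}{1 + d^2}, \qquad d = \frac{\mu_X - C}{\sigma_X}$$

$$k^2(X,T) - \rho^2(X,T) = \frac{d^2\,\bigl[1 - \rho^2(X,T)\bigr]}{1 + d^2} \;\ge\; 0$$

For a fixed classical reliability, k²(X,T) therefore rises toward 1.00 as |d| grows and never falls below ρ²(X,T). Because the classical coefficient itself varies across the tests in Tables 8 and 9, this identity accounts for the direction of the observed correlations but not for their size.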

6. Proportion of Mastery Students (Pm)

The relationship between the proportion of mastery students and
the three coefficients was investigated separately from that of Px,
even though the two were closely correlated (.92 for pretests, .83 for
CET's, .78 for posttests and .93 for the combined total test data),
because Pm has practical significance for decision-making. The obtained
correlations are presented in Table 10.



Table 10. Correlations of the Percent of Mastery Students with
the Three Reliability Coefficients

                     Pm                   Correlations with
Test Type       Mean       SD     KR-20    k²(X,T)  μc            t
Pretest        54.52    26.65    -.22**   -.43**    .21**    7.66**
CET            87.49    11.21    -.19**    .24**   -.40**    6.05**
Posttest       83.97    11.51    -.26**   -.01     -.26**     2.25*
TOTAL          73.54    24.28                      -.15**    3.01**

*  Significant at the .05 level.
** Significant at the .01 level.



According to Table 10, KR-20 was significantly and negatively
correlated with Pm. The results seem reasonable because an in-
crease in Pm might mean a reduction of test variance. In this
regard, it does not seem appropriate to compute KR-20 for a criterion-
referenced test, especially when it is administered as a CET or as
a posttest.

k²(X,T) did not demonstrate a consistent relationship with Pm;
this requires further study. μc has a positive correlation for
pretests and negative correlations for CET's and posttests. When Pm
reaches an extreme value (0 or 100%), μc becomes zero, like an
ordinary correlation coefficient. There was one case with Pm = 100 and μc = 0
among the 265 pretest cases, 25 among the 208 CET's, and 11
among the 205 posttests. Obviously these extreme cases influenced
the size of the correlations for the CET and posttest cases. However,
one would still not expect to find significant positive correlations
for the CET and posttest cases even if these extreme cases were elim-
inated.

7. Shape of Score Distribution (SSD) 

Shape of score distribution is a categorical variable* According 
to Harris (1972-b), the maximum value of yuc^ is expected to vary along 
with the shape of score distribution. For symmetric distributions 
of equal range, a rectangular distribution gives a larger maximum jdc^ 
than does a normal distribution, and a U-shaped dis tribution has a 
larger maximum jac^ than does a rectangular distribution. 

Therefore, value 1 was assigned to one-point distributions,
value 2 to bell-shaped distributions, value 3 to rectangular or
right-triangle-shaped distributions with a gradual slope, value 4 to
J-shaped distributions, and value 5 to very steep J-shaped distribu-
tions with 2 or 3 entry points. Correlations of this categorical variable
with the coefficients are presented in Table 11.
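Harris' ordering can be checked numerically under an assumption: if the maximum efficiency is taken to be the largest share of score variance that any single master/non-master cut can explain (the squared point-biserial correlation maximized over cutoffs), the Python sketch below compares three symmetric distributions on the same 0-20 score range. The stand-in index and the three distributions are illustrative choices, not Harris' derivation and not data from this study.

```python
from math import comb


def max_split_share(weights):
    """Largest share of score variance explained by any single cut, for a
    score distribution given as frequencies for scores 0, 1, ..., len-1.
    Assumed stand-in for 'maximum efficiency', not Harris' formula."""
    points = [(s, w) for s, w in enumerate(weights) if w > 0]
    n = sum(w for _, w in points)
    mean = sum(s * w for s, w in points) / n
    total_ss = sum(w * (s - mean) ** 2 for s, w in points)
    best = 0.0
    for cut in range(1, len(weights)):
        low = [(s, w) for s, w in points if s < cut]
        high = [(s, w) for s, w in points if s >= cut]
        if not low or not high:
            continue  # everyone on one side of the cut: no split
        between_ss = 0.0
        for group in (low, high):
            g_n = sum(w for _, w in group)
            g_mean = sum(s * w for s, w in group) / g_n
            between_ss += g_n * (g_mean - mean) ** 2
        best = max(best, between_ss / total_ss)
    return best


# Three hypothetical symmetric distributions on the same 0-20 range.
bell = [comb(20, s) for s in range(21)]            # binomial, roughly normal
rectangular = [1] * 21
u_shaped = [(s - 10) ** 2 + 1 for s in range(21)]  # heavy at both ends

for name, dist in [("bell", bell), ("rectangular", rectangular), ("U", u_shaped)]:
    print(name, round(max_split_share(dist), 3))
```

The shares come out at roughly 0.64 (bell), 0.75 (rectangular), and 0.93 (U-shaped), the same ordering Harris describes. For a continuous normal distribution split at the median, the corresponding squared point-biserial value is 2/π, about 0.64.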



Table 11. Correlations of the Shape of Score Distribution
with the Three Reliability Coefficients

                   SSD                      Correlations with
Test Type       Mean    SD      KR-20    K²(X,T)    ρc       Max ρc²      t
Pretest         3.22    1.28     .01     -.15*      .26**     .07       4.47**
CET             3.90    1.04     .36**    .01       .40**     .55**     3.82**
Posttest        4.03    1.10     .43**    .01       .56**     .48**     6.07**
TOTAL           3.68    1.22     .10*    -.14**     .32**     .33**     8.05**

* Significant at the .05 level.
** Significant at the .01 level.



The data in Table 11 seem to support Harris' intuition with
one exception: the correlation between SSD and maximum ρc² is not
statistically significant when calculated from the pretest data.
The low correlation seems to have resulted from the fact that maximum
ρc² had a very small standard deviation.

C. Relations of the Three Reliability Coefficients to I_pre-CET,
I_pre-post, and I_CET-post

Each of the three indices, I_pre-CET, I_pre-post, and I_CET-post,
actually represents a compound effect of, at least, the reliability of
the two tests used and the effectiveness of instruction. Therefore,
the correlation coefficients shown in Table 12 may be inflated.



Table 12. Correlations of the Three Reliability Coefficients
with the I_pre-CET, I_pre-post, and I_CET-post Indices

                           No. of       Index             Correlations with
Test Type   Index          Pairs    Mean     SD      KR-20    K²(X,T)     ρc         t
Pretest     I_pre-CET       206     84.85   13.56     .10     -.01       .15*      1.57
            I_pre-post      205     68.86   22.29    -.08     -.13*      .14*      2.55*
CET         I_pre-CET       206     84.85   13.56    -.18**    .23**    -.41**     6.11**
            I_CET-post      203     55.46   25.65    -.08      .17**    -.17**     2.96**
Posttest    I_pre-post      205     68.86   22.29    -.24**    .04      -.26**     2.68**
            I_CET-post      203     55.46   25.65    -.26**    .03      -.23**     2.37*

* Significant at the .05 level.
** Significant at the .01 level.



Table 12 shows a contrasting tendency between K²(X,T) and ρc for the
pretest on the one hand and for the CET and posttest on the other. For the
pretest data, ρc was positively correlated with the I_pre-CET and
I_pre-post indices. On the other hand, K²(X,T) was negatively correlated
with them, though the first correlation coefficient was not statistically
significant. However, this tendency was reversed for the CET and posttest
data: K²(X,T) was positively correlated (though the correlation
coefficients for the posttest data were not significant), and ρc was
significantly negatively correlated. More studies seem necessary on the
relationship between the reliability of a CRT and its actual ability to
classify students into mastery and non-mastery categories.
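The text does not say how the t values in Tables 10 through 12 were computed. They grow with the gap between the K²(X,T) and ρc columns, which is consistent with (but does not confirm) a test for the difference between two correlations that share a variable and come from the same cases. One standard test of that kind is Hotelling's t with n - 3 degrees of freedom, sketched below in Python. It requires the correlation between the two reliability coefficients themselves, which the report does not give, so the value used here is hypothetical and the printed result is not expected to reproduce the 2.55 in Table 12.

```python
from math import sqrt


def hotelling_t(r_xy, r_xz, r_yz, n):
    """Hotelling's t (df = n - 3) for H0: corr(x, y) == corr(x, z), where the
    two correlations share variable x and are computed on the same n cases."""
    # Determinant of the 3 x 3 correlation matrix of x, y, z.
    det = 1 - r_xy**2 - r_xz**2 - r_yz**2 + 2 * r_xy * r_xz * r_yz
    return (r_xy - r_xz) * sqrt((n - 3) * (1 + r_yz) / (2 * det))


# x = I_pre-post, y = K2(X,T), z = rho_c (pretest row of Table 12).
# r_yz = 0.40 is a HYPOTHETICAL value; the report does not give corr(K2, rho_c).
print(round(hotelling_t(-0.13, 0.14, 0.40, 205), 2))
```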
SUMMARY AND CONCLUDING REMARKS



The present study is a part of the overall study that was designed to
find clues to two questions: (1) what kinds of reliability coefficients are
appropriate for various criterion-referenced testing situations, and (2)
what are the most appropriate ways of interpreting these coefficients when
they are computed.

Livingston's reliability coefficients and Harris' indices of
efficiency were computed for 678 criterion-referenced tests in the A
through E levels of IPI Mathematics, Edition II. The coefficients
were carefully studied and compared with each other and with the classi-
cal internal consistency coefficients, KR-20's, in relation to the number
of students, the number of items, the percentage points of the mastery
criterion score and the mean, the absolute value of the difference of the
mean from the mastery criterion score expressed both as a percentage and
in standard score form, the standard deviation, the percent of mastery
students, the shape of the score distribution, and the mastery status
indices derived from the cross-tabulated tables of students' performance
on the pretest and the curriculum embedded test (CET), the pretest and the
posttest, and the CET and the posttest.

Generally, the means of Harris' indices were larger than those of
Livingston's coefficients for all test types (pretest, CET, and posttest).

All three reliability coefficients investigated in the present
study were higher when a criterion-referenced test was administered as a
pretest than when it was used as a CET or as a posttest.

The classical internal consistency coefficient, KR-20, was found to
be highly and positively correlated with the standard deviation. The number
of cases and the number of items were moderately correlated with KR-20.
KR-20 was negatively correlated with the percentage point of the mean.

Livingston's coefficient was highly correlated with the discrepancy
between the mean and the mastery criterion score. The standard deviation was
also highly correlated with Livingston's coefficient for the pretest and
posttest cases.
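Livingston's coefficient is defined as K²(X,T) = [ρ σ²X + (μX - C)²] / [σ²X + (μX - C)²], where ρ is a classical reliability estimate such as KR-20, σ²X and μX are the observed-score variance and mean, and C is the mastery criterion score. Because (μX - C)² appears in both the numerator and the denominator, K²(X,T) rises toward 1 as the mean moves away from the criterion, which is consistent with the strong correlation with the mean-criterion discrepancy reported above. A minimal Python sketch with hypothetical inputs:

```python
def livingston_k2(reliability, variance, mean, cutoff):
    """Livingston's K^2(X,T) from a classical reliability estimate
    (e.g. KR-20), the observed-score variance, the mean, and the cutoff."""
    d2 = (mean - cutoff) ** 2  # squared distance of the mean from the criterion
    return (reliability * variance + d2) / (variance + d2)


# Hypothetical test: KR-20 = .50, score variance = 4, mastery cutoff = 8.
for mean in (8.0, 9.0, 10.0, 12.0):
    print(mean, round(livingston_k2(0.50, 4.0, mean, 8.0), 3))
```

With these made-up values, K²(X,T) equals the KR-20 value (0.50) when the mean sits exactly on the cutoff and climbs to 0.60, 0.75, and 0.90 as the mean moves 1, 2, and 4 points away from it, even though the classical reliability never changes.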
When derived from the pretest data, Harris' index showed no significant
relation to any variable studied, with the exception that it was moderately
and positively correlated with the proportion of mastery students and the
shape of score distributions. This trend changed when criterion-referenced
tests were given either as CETs or as posttests: Harris' index was then
negatively correlated with the discrepancy between the mean and the mastery
criterion score, with the proportion of mastery students, and, interestingly
enough, with the number of items. It was positively correlated with the
number of students who took the test. The shape of the score distribution
maintained the same trend as was found with Harris' index based on the
pretest.

As mentioned before, the present paper reports only the descriptive
part of the overall study. On the basis of the data presented to date, it
could be concluded that Harris' index is relatively stable across all the
testing situations considered. Livingston's coefficient seems to require
different standards for interpretation when it is based on data collected
in different testing situations. However, the present author feels that any
final conclusions and specific implications for the interpretation of the two
reliability coefficients should wait until the following ongoing studies are
completed: (1) comparisons of the three coefficients in relation to each of
the variables mentioned previously, and (2) analyses of the relative
contribution each variable makes to the size of the reliability
coefficients.